
    Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices

    This paper is concerned with the interplay between statistical asymmetry and spectral methods. Suppose we are interested in estimating a rank-1 and symmetric matrix $\mathbf{M}^{\star}\in \mathbb{R}^{n\times n}$, yet only a randomly perturbed version $\mathbf{M}$ is observed. The noise matrix $\mathbf{M}-\mathbf{M}^{\star}$ is composed of zero-mean independent (but not necessarily homoscedastic) entries and is, therefore, not symmetric in general. This might arise, for example, when we have two independent samples for each entry of $\mathbf{M}^{\star}$ and arrange them into an asymmetric data matrix $\mathbf{M}$. The aim is to estimate the leading eigenvalue and eigenvector of $\mathbf{M}^{\star}$. We demonstrate that the leading eigenvalue of the data matrix $\mathbf{M}$ can be $O(\sqrt{n})$ times more accurate (up to some log factor) than its (unadjusted) leading singular value in eigenvalue estimation. Further, the perturbation of any linear form of the leading eigenvector of $\mathbf{M}$ (say, the entrywise eigenvector perturbation) is provably well controlled. This eigendecomposition approach is fully adaptive to heteroscedasticity of the noise, without the need for careful bias correction or any prior knowledge about the noise variance. We also provide partial theory for the more general rank-$r$ case. The takeaway message is this: arranging the data samples in an asymmetric manner and performing eigendecomposition can sometimes be beneficial.
    Comment: accepted to the Annals of Statistics, 2020. 37 pages.
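    The claim can be checked numerically. Below is a minimal sketch (assumed setup: a planted rank-1 signal with i.i.d. Gaussian noise, not the paper's general heteroscedastic model) comparing the leading eigenvalue of the asymmetric data matrix with its leading singular value as estimators of the true leading eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# rank-1 symmetric ground truth M* = lambda* u u^T with a unit eigenvector u
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
lam_star = 4.0 * np.sqrt(n)              # assumed signal strength
M_star = lam_star * np.outer(u, u)

# asymmetric observation: every entry gets its own independent noise sample,
# so the noise matrix (and hence M) is not symmetric
M = M_star + rng.standard_normal((n, n))

# leading eigenvalue of the asymmetric matrix vs. its leading singular value
eigvals = np.linalg.eigvals(M)
lam_eig = eigvals[np.argmax(eigvals.real)].real
lam_svd = np.linalg.svd(M, compute_uv=False)[0]

print("leading eigenvalue error    :", abs(lam_eig - lam_star))
print("leading singular value error:", abs(lam_svd - lam_star))
```

    On typical draws the eigenvalue estimate lands much closer to the true value, consistent with the abstract's point that the singular value carries a noise-induced upward bias which the eigenvalue of the asymmetric matrix avoids.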

    Information Recovery from Pairwise Measurements

    A variety of information processing tasks in practice involve recovering $n$ objects from single-shot graph-based measurements, particularly those taken over the edges of some measurement graph $\mathcal{G}$. This paper concerns the situation where each object takes a value over a group of $M$ different values, and where one wishes to recover all these values based on observations of certain pairwise relations over $\mathcal{G}$. The imperfection of measurements presents two major challenges for information recovery: 1) inaccuracy: a (dominant) portion $1-p$ of measurements are corrupted; 2) incompleteness: a significant fraction of pairs are unobservable, i.e., $\mathcal{G}$ can be highly sparse. Under a natural random outlier model, we characterize the minimax recovery rate, that is, the critical threshold of the non-corruption rate $p$ below which exact information recovery is infeasible. This accommodates a very general class of pairwise relations. For various homogeneous random graph models (e.g., Erdős-Rényi random graphs, random geometric graphs, small-world graphs), the minimax recovery rate depends almost exclusively on the edge sparsity of the measurement graph $\mathcal{G}$, irrespective of other graphical metrics. This fundamental limit decays with the group size $M$ at a square-root rate before entering a connectivity-limited regime. Under the Erdős-Rényi random graph, a tractable combinatorial algorithm is proposed to approach the limit for large $M$ ($M = n^{\Omega(1)}$), while order-optimal recovery is enabled by semidefinite programs in the small-$M$ regime. The extended (and most updated) version of this work can be found at http://arxiv.org/abs/1504.01369.
    Comment: This version is no longer updated; please find the latest version at arXiv:1504.01369.
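    As a concrete (and deliberately naive) illustration of the measurement model, the sketch below generates corrupted pairwise differences over an Erdős-Rényi graph and decodes them with a greedy BFS vote; this is neither the combinatorial algorithm nor the SDP from the paper, and the parameter choices are arbitrary.

```python
import numpy as np
from collections import Counter, deque

rng = np.random.default_rng(0)
n, M = 400, 8            # number of objects, group size
p_obs, p = 0.05, 0.7     # edge probability, non-corruption rate
x = rng.integers(0, M, size=n)               # ground-truth values in Z_M

# corrupted pairwise differences over an Erdős-Rényi measurement graph
edges = {i: [] for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p_obs:
            y = (x[i] - x[j]) % M if rng.random() < p else rng.integers(0, M)
            edges[i].append((j, y))            # y_ij = x_i - x_j (mod M)
            edges[j].append((i, (-y) % M))     # y_ji = x_j - x_i (mod M)

# greedy decoder: BFS from node 0, setting each new node by a majority vote
# over its already-labeled neighbors (x_j = y_jk + x_k mod M)
est, queue = {0: 0}, deque([0])
while queue:
    i = queue.popleft()
    for j, _ in edges[i]:
        if j in est:
            continue
        votes = [(y_jk + est[k]) % M for k, y_jk in edges[j] if k in est]
        est[j] = Counter(votes).most_common(1)[0][0]
        queue.append(j)

# accuracy up to a global shift (pairwise differences fix x only up to offset)
err = [(est[i] - x[i]) % M for i in est]
offset = Counter(err).most_common(1)[0][0]
print("fraction recovered:", sum(e == offset for e in err) / n)
```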

    Trip Prediction by Leveraging Trip Histories from Neighboring Users

    We propose a novel approach for trip prediction by analyzing users' trip histories. We augment each user's own trip history with 'similar' trips from other users, which can be informative for predicting that user's future trips. This also helps to cope with noisy or sparse trip histories, where the self-history by itself does not provide a reliable prediction of future trips. We show empirical evidence that enriching users' trip histories with additional trips reduces the prediction error by 15%-40%, evaluated on multiple subsets of the Nancy2012 dataset. This real-world dataset is collected from public transportation ticket validations in the city of Nancy, France. Our prediction tool is a central component of a trip simulator system designed to analyze the functionality of public transportation in the city of Nancy.
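    A minimal sketch of the augmentation idea, using made-up toy trips rather than the Nancy2012 validation records, with simple cosine similarity and a frequency-based predictor standing in for the paper's models:

```python
import numpy as np
from collections import Counter

# toy data: each user's history is a list of (origin, destination) trips
histories = {
    "u1": [("A", "B"), ("A", "B"), ("B", "C")],
    "u2": [("A", "B"), ("B", "C"), ("B", "C")],
    "u3": [("B", "C"), ("C", "D")],            # sparse history
}

def profile(trips, vocab):
    """Normalized trip-frequency vector used to measure user similarity."""
    c = Counter(trips)
    v = np.array([c[t] for t in vocab], dtype=float)
    return v / (np.linalg.norm(v) or 1.0)

vocab = sorted({t for h in histories.values() for t in h})
vecs = {u: profile(h, vocab) for u, h in histories.items()}

def augmented_history(user, k=1):
    """Augment a user's own trips with trips from the k most similar users."""
    sims = sorted(((vecs[user] @ vecs[v], v) for v in histories if v != user),
                  reverse=True)
    extra = [t for _, v in sims[:k] for t in histories[v]]
    return histories[user] + extra

def predict_next(user):
    """Naive predictor: most frequent trip in the (augmented) history."""
    return Counter(augmented_history(user)).most_common(1)[0][0]

print(predict_next("u3"))   # the sparse user borrows evidence from a neighbor
```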

    When Shopbots Meet Emails: Implications for Price Competition on the Internet

    The Internet has dramatically reduced search costs for customers through tools such as shopbots. The conventional wisdom is that this reduction in search costs will increase price competition, leading to a decline in prices and profits for online firms. In this paper, we argue that, in contrast to conventional wisdom, competition may be reduced and prices may rise as consumers' search costs for prices fall. Our argument has particular appeal in the context of the Internet, where email targeting and the ability to track and record customer behavior are institutional features that facilitate cost-effective targeted pricing by firms. We show that such targeted pricing can serve as an effective counterweight that keeps average prices high despite the downward pressure on prices due to low search costs. Surprisingly, we find that the effectiveness of targeting itself improves as search costs fall; therefore prices and profits can increase as search costs fall.
    The intuition for our argument is as follows. Consider a market where consumers are heterogeneous in their loyalty as well as in their cost per unit time to search. In the brick-and-mortar world, it takes consumers a very large amount of time to search across multiple firms. Therefore few customers will search in equilibrium, because the gains from search are relatively small compared to the cost of search. In such a market, a firm cannot distinguish whether its customers bought from it due to their high loyalty or due to their unwillingness to search for low prices because of the high search cost. On the Internet, the amount of time to search across multiple stores is minimal (say zero). Now, irrespective of their opportunity cost of time, all consumers can search because the time to search is negligible. If, in spite of this, a consumer does not search, she is revealing that her loyalty to the firm she buys from is very high. The key insight is that as search becomes easy for everyone, lack of search indicates strong customer loyalty and can thus be used as a proxy to segment the market into loyal and price-sensitive segments. Thanks to email technology, firms can selectively set differential prices for different customers, i.e., a high price for the loyal segment and a low price for the price-sensitive segment, at relatively low cost. The increased competition due to price transparency caused by low search costs can thus be offset by the ability of firms to price discriminate between their loyal (price-insensitive) customers and their price-sensitive customers. In fact, we find that it can reduce the extent of competition among the firms and raise their profits. Most surprisingly, the positive effect of targeting on prices improves as search costs fall, because firms can learn more about the differences in customer loyalty, improving the effectiveness of targeted pricing. The effectiveness of targeted pricing, however, is moderated by the extent of opt-in by customers who give firms permission to contact them directly by email.
    Our analysis offers strategic insights for managers about how to address the competitive problems associated with low search costs on the Internet: (1) Firms should invest in better technologies for personalization and targeted pricing so as to prevent the Internet from becoming a competitive minefield that destroys firm profitability. In fact, we show that low search costs can facilitate better price personalization and can thus improve the effectiveness of targeted pricing efforts. (2) The analysis also offers guidelines for online customer acquisition efforts. The critical issue for competitive advantage is not increasing market share per se, but increasing the loyalty of customers. While a larger share of very loyal customers reduces competitive intensity, a larger share of customers who are not very loyal can surprisingly be a competitive disadvantage. For customer acquisition to be profitable, it should be accompanied by a superior product or service that can ensure high loyalty. (3) Investing in online privacy initiatives that assure consumers their private information will not be used other than to offer them "deals" is worthwhile. Such assurances encourage consumers to opt into firm mailing lists, which facilitates successful targeting and in turn ameliorates the competitive threats due to low search costs on the Internet. (4) When the overwhelming majority of customers are satisfied with online privacy, the remaining privacy-conscious customers who are not willing to pay a higher price to maintain their privacy will be left out of the market. While this may be of some concern to privacy advocates, it is interesting that total consumer welfare can be higher even if some consumers are left out of the market.
    Our analysis captures the competitive implications of the interaction between two institutions facilitated by the Internet: shopbots and email. But the research question addressed is more fundamental: what is the nature of competition in an environment with low costs for both consumer search and firm-to-consumer personalized communications? The strategic insights obtained in the paper may be beneficially applied even to offline businesses that can replicate such an environment. For example, offline firms could maintain websites on which they post prices, allowing for easy price comparisons. They could also use tools such as frequency programs to create addressable databases that enable them to communicate with customers by direct mail and email (as many airlines and stores do).
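    The segmentation argument can be made concrete with a toy revenue calculation (all numbers are invented for illustration; this is not the paper's equilibrium model): once search is free, non-searchers reveal themselves as loyal, and emailed 'deals' let a firm keep a high list price for loyals while matching the competitive price for searchers.

```python
# Toy illustration only: compare the best uniform price against targeted prices,
# where the 'shopper' willingness to pay stands in for the rival's low price.
customers = [
    # (searched_for_prices, max_price_this_customer_will_pay_here)
    (False, 10.0),   # did not search even though search is free -> loyal
    (False, 10.0),
    (True,   4.0),   # searched -> buys here only at the competitive price
    (True,   4.0),
    (True,   4.0),
]

def revenue_uniform(price):
    return sum(price for _, wtp in customers if price <= wtp)

def revenue_targeted(loyal_price, deal_price):
    # email the low 'deal' price to searchers, keep the high price for loyals
    total = 0.0
    for searched, wtp in customers:
        price = deal_price if searched else loyal_price
        if price <= wtp:
            total += price
    return total

print("best uniform price revenue:", max(revenue_uniform(p) for p in (4.0, 10.0)))
print("targeted pricing revenue  :", revenue_targeted(10.0, 4.0))
```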

    Scalable Semidefinite Relaxation for Maximum A Posterior Estimation

    Maximum a posteriori (MAP) inference over discrete Markov random fields is a fundamental task spanning a wide spectrum of real-world applications, and it is known to be NP-hard for general graphs. In this paper, we propose a novel semidefinite relaxation formulation (referred to as SDR) to estimate the MAP assignment. Algorithmically, we develop an accelerated variant of the alternating direction method of multipliers (referred to as SDPAD-LR) that can effectively exploit the special structure of the new relaxation. Encouragingly, the proposed procedure allows solving SDR for large-scale problems, e.g., problems on a grid graph comprising hundreds of thousands of variables with multiple states per node. Compared with prior SDP solvers, SDPAD-LR attains comparable accuracy while exhibiting remarkably improved scalability, in contrast to the commonly held belief that semidefinite relaxation can only be applied to small-scale MRF problems. We have evaluated the performance of SDR on various benchmark datasets, including OPENGM2 and PIC, in terms of both the quality of the solutions and the computation time. Experimental results demonstrate that for a broad class of problems, SDPAD-LR outperforms state-of-the-art algorithms in producing better MAP assignments in an efficient manner.
    Comment: accepted to the International Conference on Machine Learning (ICML 2014).
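    For intuition about what a semidefinite relaxation of MAP looks like, here is the textbook relaxation of a binary pairwise MRF (no unary terms) with randomized-hyperplane rounding, written with cvxpy; it is not SDR/SDPAD-LR and will not scale to the problem sizes discussed above.

```python
# For spins s in {-1,+1}^n and symmetric couplings W, MAP maximizes s^T W s.
# The relaxation optimizes over a PSD matrix X with unit diagonal, then rounds.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 12
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                     # symmetric coupling matrix

X = cp.Variable((n, n), PSD=True)
prob = cp.Problem(cp.Maximize(cp.trace(W @ X)), [cp.diag(X) == 1])
prob.solve()

# randomized-hyperplane rounding of the relaxed solution back to {-1,+1}^n
L = np.linalg.cholesky(X.value + 1e-6 * np.eye(n))
s = np.sign(L @ rng.standard_normal(n))
print("relaxation value:", prob.value, " rounded value:", s @ W @ s)
```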

    On the Minimax Capacity Loss under Sub-Nyquist Universal Sampling

    This paper investigates the information rate loss in analog channels when the sampler is designed to operate independently of the instantaneous channel occupancy. Specifically, a multiband linear time-invariant Gaussian channel under universal sub-Nyquist sampling is considered. The entire channel bandwidth is divided into $n$ subbands of equal bandwidth. At each time only $k$ constant-gain subbands are active, and the instantaneous subband occupancy is known neither at the receiver nor at the sampler. We study the information loss through a capacity loss metric, that is, the capacity gap caused by the lack of instantaneous subband occupancy information. We characterize the minimax capacity loss for the entire sub-Nyquist rate regime, provided that the number $n$ of subbands and the SNR are both large. The minimax limits depend almost solely on the band sparsity factor and the undersampling factor, modulo some residual terms that vanish as $n$ and the SNR grow. Our results highlight the power of randomized sampling methods (i.e., samplers that consist of random periodic modulation and low-pass filters), which are able to approach the minimax capacity loss with exponentially high probability.
    Comment: accepted to IEEE Transactions on Information Theory. It has been presented in part at the IEEE International Symposium on Information Theory (ISIT) 201
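    A small Monte Carlo sketch of the setup (assumptions: unit-variance white noise, isotropic Gaussian inputs on the active subbands, and a plain random projection as the 'universal' sampler; this computes an achievable Gaussian-input rate gap between an occupancy-aware and a universal sampler, not the paper's minimax capacity loss):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m, snr = 64, 8, 16, 10.0   # subbands, active subbands, samples, SNR

def rate(A, support):
    """Gaussian-input rate (nats) of y = A(x + w): x has power snr/k on the
    active subbands and is zero elsewhere; w is unit-variance white noise."""
    AS = A[:, support]
    noise_cov = A @ A.T
    sig_cov = (snr / k) * (AS @ AS.T)
    return 0.5 * (np.linalg.slogdet(noise_cov + sig_cov)[1]
                  - np.linalg.slogdet(noise_cov)[1])

gaps = []
for _ in range(200):
    support = rng.choice(n, size=k, replace=False)
    others = np.setdiff1d(np.arange(n), support)
    rows = np.concatenate([support, others[: m - k]])
    aware = np.eye(n)[rows]                    # sampler that knows the occupancy
    universal = rng.standard_normal((m, n))    # occupancy-independent sampler
    gaps.append(rate(aware, support) - rate(universal, support))

print("mean rate gap (nats per channel use):", np.mean(gaps))
```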